[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` #24490
Merged: tjtanaa merged 60 commits into vllm-project:main on Nov 10, 2025
Conversation
SageMoore approved these changes on Nov 7, 2025
tjtanaa approved these changes on Nov 10, 2025
@tjtanaa @vllmellm It seems this PR introduced mandatory imports of rocm.py, which now happen by default on NVIDIA devices. Can we move these to lazy imports? I opened #28428 for fp8_utils.py, but then saw that this applies to more files.
This was referenced Nov 11, 2025

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request on Nov 29, 2025
Purpose
This PR introduces `_aiter_ops.py` as proposed in the RFC here. The `_aiter_ops` namespace provides several key benefits:

- Centralized kernel registration: ensures that kernels from the aiter package are properly registered.
- Environment availability checks: encapsulates aiter support detection and environment-compatibility validation.
- Reduced code duplication: eliminates duplicate helper functions for device-compatibility and environment-variable enablement checks across different vLLM modules.
This implementation establishes the foundation for future refactoring efforts, where existing kernels throughout the vLLM repository will be migrated to use this unified approach for better maintainability and consistency.
This PR uses the `5ee37dce` commit from the `aiter` repo.

Test Plan
Test the models that are affected by this change, using lm_eval on the gsm8k dataset.
Environment setting

Step 1: run vllm serve

```shell
VLLM_USE_V1=1 \
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
VLLM_DISABLE_COMPILE_CACHE=1 \
vllm serve $MODEL_NAME \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE", "cudagraph_capture_sizes": [1,2,4,8,16,24,32]}' \
  --trust-remote-code --swap-space 16 --distributed-executor-backend mp
```

Step 2: run lm_eval

```shell
lm_eval --model local-completions --tasks gsm8k \
  --model_args model=$MODEL_NAME,base_url=http://localhost:8000/v1/completions \
  --trust_remote_code --num_fewshot 5 --batch_size 256
```

Test Results
deepseek-ai/DeepSeek-V3 -tp 8 --block-size 1 --max-model-len 32768 --max_seq_len_to_capture 32768
meta-llama/Llama-4-Scout-17B-16E-Instruct -tp 8 --max-model-len 8192
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len 8192
serve Qwen/Qwen3-235B-A22B-FP8 -tp 4
mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2
mistralai/Mixtral-8x7B-Instruct-v0.1 -tp 2 --quantization fp8
meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2
meta-llama/Meta-Llama-3.3-70B-Instruct -tp 2 --quantization fp8
amd/Llama-3.3-70B-Instruct-FP8-KV -tp 2